Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations

Authors

  • Xiaoyang Gao
  • Sriram Krishnamoorthy
  • Swarup Kumar Sahoo
  • Chi-Chung Lam
  • Gerald Baumgartner
  • J. Ramanujam
  • P. Sadayappan
Abstract

Compile-time optimizations involve a number of transformations such as loop permutation, fusion, tiling, and array contraction. Choosing the combination of these transformations that minimizes execution time is a challenging task. We address this problem in the context of tensor contraction expressions involving arrays too large to fit in main memory. Domain-specific features of the computation are exploited to develop an integrated framework that facilitates the exploration of the entire search space of optimizations. In this paper, we discuss the exploration of the space of loop fusion and tiling transformations in order to minimize the disk I/O cost. These two transformations are integrated, and pruning strategies are presented that significantly reduce the number of loop structures to be evaluated for subsequent transformations. The evaluation of the framework using representative contraction expressions from quantum chemistry shows a dramatic reduction in the size of the search space using the strategies presented.
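
As a rough illustration of why loop fusion matters for such contraction expressions, the sketch below evaluates a two-step contraction with and without fusing the producer and consumer loops. It is not the authors' framework: the function names `unfused` and `fused`, the example expression, and the element-count bookkeeping are all assumptions made for illustration, and the sketch models neither tiling nor disk I/O.

```python
# Two-step contraction (hypothetical example, not taken from the paper):
#   T[i][j] = sum_k A[i][k] * B[k][j]
#   C[i]    = sum_j T[i][j] * D[j]

def unfused(A, B, D):
    """Compute T in full, then consume it. Returns (C, elements stored for T)."""
    n = len(A)
    # The whole intermediate T is materialized: n * n elements.
    T = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    C = [sum(T[i][j] * D[j] for j in range(n)) for i in range(n)]
    return C, n * n

def fused(A, B, D):
    """Fuse the i and j loops of producer and consumer. Returns (C, elements stored for T)."""
    n = len(A)
    C = [0] * n
    for i in range(n):
        for j in range(n):
            # Because both statements share loops i and j, T contracts to a scalar.
            t = sum(A[i][k] * B[k][j] for k in range(n))
            C[i] += t * D[j]
    return C, 1

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
D = [1, 1]
c_unfused, size_unfused = unfused(A, B, D)
c_fused, size_fused = fused(A, B, D)
```

Both versions compute the same `C`; only the storage needed for the intermediate differs (`n * n` elements versus one). When the intermediate is too large for main memory, that difference determines whether it has to be written to disk at all, which is the kind of trade-off a fusion and tiling search would have to weigh.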

Similar resources

Pruning the Optimization Search Space Using Architecture-aware Cost Models

In recent years, a number of strategies have emerged for empirically tuning applications to different architectures. Although quite successful for certain domains, empirical tuning is yet to gain wide acceptance as a viable strategy in high-performance computing. The principal bottleneck in this regard is the prohibitively large search space that needs to be explored in order to discover the be...

Memory-Constrained Data Locality Optimization for Tensor Contractions

The accurate modeling of the electronic structure of atoms and molecules involves computationally intensive tensor contractions over large multidimensional arrays. Efficient computation of these contractions usually requires the generation of temporary intermediate arrays. These intermediates could be extremely large, requiring their storage on disk. However, the intermediates can often be gene...

CHiLL: A Framework for Composing High-Level Loop Transformations

This paper describes a general and robust loop transformation framework that enables compilers to generate efficient code on complex loop nests. Despite two decades of prior research on loop optimization, performance of compiler-generated code often falls short of manually optimized versions, even for some well-studied BLAS kernels. There are two primary reasons for this. First, today’s compile...

Fusion: a General Framework for Hierarchical Tilings of R^d

We introduce a formalism for handling general spaces of hierarchical tilings, a category that includes substitution tilings, Bratteli-Vershik systems, S-adic transformations, and multi-dimensional cut-and-stack transformations. We explore ergodic, spectral and topological properties of these spaces. We show that familiar properties of substitution tilings carry over under appropriate assumption...

New pruning criteria for efficient decoding

In large vocabulary continuous speech recognizers the search space needs to be constrained efficiently to make the recognition task feasible. Beam pruning and restricting the number of active paths are the most widely applied techniques for this. In this paper, we present three additional pruning criteria, which can be used to further limit the search space. These new criteria take into account...

Publication date: 2005